- Asia > China > Shanxi Province > Taiyuan (0.04)
- Asia > Middle East > Jordan (0.04)
LRMR: LLM-Driven Relational Multi-node Ranking for Lymph Node Metastasis Assessment in Rectal Cancer
Dong, Yaoxian, Gao, Yifan, Li, Haoyue, Cui, Yanfen, Gao, Xin
Accurate preoperative assessment of lymph node (LN) metastasis in rectal cancer guides treatment decisions, yet conventional MRI evaluation based on morphological criteria shows limited diagnostic performance. While some artificial intelligence models have been developed, they often operate as black boxes, lacking the interpretability needed for clinical trust. Moreover, these models typically evaluate nodes in isolation, overlooking the patient-level context. To address these limitations, we introduce LRMR, an LLM-Driven Relational Multi-node Ranking framework. This approach reframes the diagnostic task from a direct classification problem into a structured reasoning and ranking process. The LRMR framework operates in two stages. First, a multimodal large language model (LLM) analyzes a composite montage image of all LNs from a patient, generating a structured report that details ten distinct radiological features. Second, a text-based LLM performs pairwise comparisons of these reports between different patients, establishing a relative risk ranking based on the severity and number of adverse features. We evaluated our method on a retrospective cohort of 117 rectal cancer patients. LRMR achieved an area under the curve (AUC) of 0.7917 and an F1-score of 0.7200, outperforming a range of deep learning baselines, including ResNet50 (AUC 0.7708). Ablation studies confirmed the value of our two main contributions: removing the relational ranking stage or the structured prompting stage led to a significant performance drop, with AUCs falling to 0.6875 and 0.6458, respectively. Our work demonstrates that decoupling visual perception from cognitive reasoning through a two-stage LLM framework offers a powerful, interpretable, and effective new paradigm for assessing lymph node metastasis in rectal cancer.
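The second LRMR stage turns pairwise comparisons of structured reports into a patient-level risk ranking, and the reported AUC can itself be read as a ranking statistic. The sketch below is a minimal, hypothetical illustration under stated assumptions. `compare` is a rule-based stand-in for the text LLM: it counts adverse features first, then sums severities. The feature names and the 0-2 severity scale are invented. Aggregating wins, with ties worth half, into a score is our assumption, not the paper's documented procedure.

```python
from itertools import combinations

def compare(report_a, report_b):
    """Hypothetical stand-in for the text LLM's pairwise judgment.
    Returns 1 if report_a looks riskier, -1 if report_b does, 0 on a tie.
    Assumption: compare the number of adverse features, then total severity."""
    key_a = (sum(v > 0 for v in report_a.values()), sum(report_a.values()))
    key_b = (sum(v > 0 for v in report_b.values()), sum(report_b.values()))
    return (key_a > key_b) - (key_a < key_b)

def rank_patients(reports):
    """Aggregate all pairwise outcomes into a risk score per patient
    (1 point per win, 0.5 per tie); a higher score means higher relative risk."""
    scores = {pid: 0.0 for pid in reports}
    for a, b in combinations(reports, 2):
        outcome = compare(reports[a], reports[b])
        if outcome > 0:
            scores[a] += 1.0
        elif outcome < 0:
            scores[b] += 1.0
        else:
            scores[a] += 0.5
            scores[b] += 0.5
    return scores

def auc_from_scores(scores, labels):
    """AUC as the fraction of (positive, negative) patient pairs that the
    scores order correctly, counting ties as 0.5 (Mann-Whitney form)."""
    pos = [scores[p] for p in scores if labels[p] == 1]
    neg = [scores[p] for p in scores if labels[p] == 0]
    total = 0.0
    for sp in pos:
        for sn in neg:
            total += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return total / (len(pos) * len(neg))

# Illustrative (invented) reports: feature -> severity (0 = absent, 1 = mild, 2 = marked).
reports = {
    "P1": {"irregular_border": 2, "heterogeneous_signal": 1, "short_axis_gt_5mm": 1},
    "P2": {"irregular_border": 0, "heterogeneous_signal": 0, "short_axis_gt_5mm": 1},
    "P3": {"irregular_border": 1, "heterogeneous_signal": 1, "short_axis_gt_5mm": 0},
}
labels = {"P1": 1, "P2": 0, "P3": 0}  # 1 = LN metastasis (invented)
scores = rank_patients(reports)
auc = auc_from_scores(scores, labels)
```

Because the AUC depends only on the ordering of patients, any method that produces a consistent relative ranking can be evaluated this way without calibrated probabilities.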
Multi-level Mixture of Experts for Multimodal Entity Linking
Hu, Zhiwei, Gutiérrez-Basulto, Víctor, Xiang, Zhiliang, Li, Ru, Pan, Jeff Z.
Multimodal Entity Linking (MEL) aims to link ambiguous mentions within multimodal contexts to the corresponding entities in a multimodal knowledge base. Existing approaches to MEL introduce multimodal interaction and fusion mechanisms to bridge the modality gap and enable multi-grained semantic matching. However, they do not address two important problems: (i) mention ambiguity, i.e., the lack of semantic content caused by the brevity of the mention's textual context and the omission of key information from it; (ii) dynamic selection of modal content, i.e., dynamically distinguishing the importance of different parts of the modal information. To mitigate these issues, we propose a Multi-level Mixture of Experts (MMoE) model for MEL. MMoE has four components: (i) the description-aware mention enhancement module leverages large language models to identify the WikiData descriptions that best match a mention, considering the mention's textual context; (ii) the multimodal feature extraction module adopts multimodal feature encoders to obtain textual and visual embeddings for both mentions and entities; (iii)-(iv) the intra-level and inter-level mixture of experts modules apply a switch mixture of experts mechanism to dynamically and adaptively select features from relevant regions of information. Extensive experiments show that MMoE outperforms state-of-the-art methods. MMoE's code is available at: https://github.com/zhiweihu1103/MEL-MMoE.
- Asia > China > Shanxi Province > Taiyuan (0.04)
- Europe > United Kingdom > Wales > Cardiff (0.04)
- Europe > France > Bourgogne-Franche-Comté > Doubs > Besançon (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.86)
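MMoE's intra-level and inter-level modules rely on a switch mixture of experts. In that mechanism, a router picks a single expert per input (top-1 routing), and the chosen expert's output is scaled by its router probability. The pure-Python sketch below shows only this generic routing step. It assumes each expert is a linear map, which is our simplification. How MMoE feeds mention/entity features and image regions into the experts is not described in the abstract and is not reproduced here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def switch_moe(x, router_W, experts):
    """Top-1 (switch) routing: compute router probabilities, send x to the
    single most probable expert, and scale that expert's output by its
    probability. Experts are assumed to be linear maps for illustration."""
    probs = softmax(matvec(router_W, x))
    k = max(range(len(probs)), key=probs.__getitem__)
    out = matvec(experts[k], x)
    return [probs[k] * o for o in out], k, probs

router_W = [[2.0, 0.0], [0.0, 2.0]]          # invented router weights
experts = [
    [[1.0, 0.0], [0.0, 1.0]],                # expert 0: identity
    [[0.0, 1.0], [1.0, 0.0]],                # expert 1: swaps the two features
]
y, k, probs = switch_moe([1.0, 0.0], router_W, experts)
y2, k2, _ = switch_moe([0.0, 1.0], router_W, experts)
```

Only one expert runs per input, so compute stays roughly constant as experts are added. This property motivates switch-style routing over dense mixtures.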
On-Device Training of PV Power Forecasting Models in a Smart Meter for Grid Edge Intelligence
Huang, Jian, Zhu, Yongli, Xu, Linna, Zheng, Zhe, Cui, Wenpeng, Sun, Mingyang
In this paper, an edge-side model training study is conducted on a resource-limited smart meter. The motivation for grid-edge intelligence and the concept of on-device training are introduced. Then, the technical preparation steps for on-device training are described. A case study on photovoltaic (PV) power forecasting is presented, in which two representative machine learning models are investigated: a gradient boosting tree model and a recurrent neural network model. To adapt to the smart meter's limited resources, "mixed"- and "reduced"-precision training schemes are also devised. Experimental results demonstrate the feasibility of economically achieving grid-edge intelligence with the existing advanced metering infrastructure.
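The abstract names "mixed"- and "reduced"-precision training schemes but does not define them. The sketch below is a generic illustration under two assumptions. "Mixed" is taken to mean fp16 arithmetic in the forward and backward pass with a full-precision master copy of the weights. "Reduced" is taken to mean that the weights themselves are stored in fp16. A one-feature linear model trained by SGD stands in for the paper's gradient boosting and RNN models. Half precision is emulated with the standard-library `struct` `"e"` format. Python floats are float64, so the "master" weights here are float64 rather than float32.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def train_low_precision(xs, ys, lr=0.01, epochs=200, reduced=False):
    """Fit y ~ w*x + b with per-sample SGD.
    Forward/backward values are rounded to fp16 in both modes.
    reduced=False ("mixed", assumed): updates accumulate in full-precision masters.
    reduced=True ("reduced", assumed): the weights are re-rounded to fp16 after
    every update, so very small updates can be lost entirely."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            w16, b16, x16 = to_fp16(w), to_fp16(b), to_fp16(x)
            pred = to_fp16(w16 * x16 + b16)
            err = to_fp16(pred - y)
            grad_w = to_fp16(2 * err * x16)   # d/dw of squared error
            grad_b = to_fp16(2 * err)         # d/db of squared error
            w -= lr * grad_w
            b -= lr * grad_b
            if reduced:
                w, b = to_fp16(w), to_fp16(b)
    return w, b

# Toy data on the line y = 2x + 1 (invented, not PV measurements).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]
w_mixed, b_mixed = train_low_precision(xs, ys, lr=0.05, epochs=500)
w_red, b_red = train_low_precision(xs, ys, lr=0.05, epochs=500, reduced=True)
```

The trade-off the sketch exposes: the full-precision master copy costs extra memory but preserves small gradient updates, whereas pure fp16 storage halves weight memory at the risk of stalled updates.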
$T^3$: Multi-level Tree-based Automatic Program Repair with Large Language Models
Liu, Quanming, Bu, Xupeng, Yan, Zhichao, Li, Ru
Automatic Program Repair (APR) is a core technology in software development and maintenance, aiming to repair defects automatically with minimal human intervention. In recent years, substantial advances in Large Language Models (LLMs) and Chain-of-Thought (CoT) techniques have significantly enhanced the reasoning capabilities of these models. However, because APR demands complex logic and multi-step reasoning, CoT techniques remain insufficiently applied in this domain. This study systematically evaluates the performance of several common CoT techniques on APR tasks and proposes $T^3$, a framework that integrates the reasoning capabilities of LLMs with tree search, improving the precision with which candidate repair solutions are generated. Furthermore, $T^3$ provides guidance for optimizing sample selection and repair strategies in APR tasks, establishing a robust framework for efficient automated debugging.
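$T^3$ combines LLM reasoning with tree search over candidate repairs. Its multi-level tree is not specified in the abstract, so the sketch below shows only a generic best-first search over patches. `expand` and `score` are hypothetical stand-ins. `expand` replaces LLM-proposed edits with single-operator substitutions, and `score` replaces running a real test suite with counting the toy test cases a candidate passes. The "program" is a toy expression `(x op1 y) op2 z`, chosen so that the fix needs two successive edits.

```python
import heapq
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def run(program, x, y, z):
    """Toy 'program': two binary operators applied as (x op1 y) op2 z."""
    op1, op2 = program
    return OPS[op2](OPS[op1](x, y), z)

def score(program, tests):
    """Fraction of test cases passed (stand-in for executing a test suite)."""
    return sum(run(program, *args) == expected for args, expected in tests) / len(tests)

def expand(program):
    """Candidate patches: change one operator at a time (stand-in for LLM proposals)."""
    for i in range(len(program)):
        for op in OPS:
            if op != program[i]:
                yield program[:i] + (op,) + program[i + 1:]

def tree_search_repair(buggy, tests, max_nodes=100):
    """Best-first search over the patch tree: always expand the highest-scoring
    node; stop when a candidate passes every test or the node budget runs out."""
    frontier = [(-score(buggy, tests), 0, buggy)]
    seen = {buggy}
    counter = 1  # unique tie-breaker so heapq never compares programs directly
    while frontier and counter <= max_nodes:
        neg_score, _, prog = heapq.heappop(frontier)
        if neg_score == -1.0:
            return prog
        for child in expand(prog):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (-score(child, tests), counter, child))
                counter += 1
    return None

# Intended behavior: (x + y) * z. The buggy version computes (x - y) + z.
TESTS = [((1, 2, 3), 9), ((2, 2, 2), 8), ((0, 5, 1), 5)]
fixed = tree_search_repair(("-", "+"), TESTS)
```

No single edit fixes the buggy program, so a one-shot repair attempt fails here. The search still reaches the two-edit patch `("+", "*")` by exploring intermediate candidates.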
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers
D'souza, Daniel, Kreutzer, Julia, Morisot, Adrien, Üstün, Ahmet, Hooker, Sara
One of the most profound challenges of modern machine learning is performing well on the long tail of rare and underrepresented features. Large general-purpose models are trained for many tasks but work best on high-frequency use cases. After training, it is hard to adapt a model to perform well on specific use cases that are underrepresented in the training corpus. Relying on prompt engineering or few-shot examples to maximize output quality on a particular test case can be frustrating, as models can be highly sensitive to small changes, react in unpredictable ways, or rely on a fixed system prompt to maintain performance. In this work, we ask: "Can we optimize our training protocols to improve both controllability and performance on underrepresented use cases at inference time?" We revisit the divide between training and inference techniques to improve long-tail performance while providing users with a set of control levers that the model is trained to respond to. We create a detailed taxonomy of data characteristics and task provenance to explicitly control generation attributes and implicitly condition generations at inference time. We fine-tune a base model to infer these markers automatically, which makes them optional at inference time. This principled and flexible approach yields pronounced improvements in performance, especially on examples from the long tail of the training distribution. While our markers yield an average lift of 5.7% in win rates for open-ended generation quality, we see gains of over 9.1% in underrepresented domains. We also observe relative lifts of up to 14.1% on underrepresented tasks such as CodeRepair and absolute improvements of 35.3% on length instruction-following evaluations.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (20 more...)
- Health & Medicine (1.00)
- Information Technology (0.93)
- Law (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
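The Treasure Hunt abstract describes training-time markers that users can supply at inference time or that the fine-tuned model can infer by itself. The sketch below shows one way such data could be prepared. It is built on assumptions. The `<key:value>` marker syntax, the `domain`/`task` keys, the example fields, and the random dropping of markers from the input (so that the model learns to emit them before its response) are all hypothetical illustrations. None of them is the paper's documented format or taxonomy.

```python
import random
import re

# Hypothetical marker syntax: <key:value>, e.g. <task:CodeRepair>.
MARKER_RE = re.compile(r"<([^<>:]+):([^<>]+)>")
LEADING_MARKERS_RE = re.compile(r"^(?:<[^<>:]+:[^<>]+>)*")

def render_markers(markers):
    """Serialize a marker dict deterministically (sorted by key)."""
    return "".join(f"<{k}:{v}>" for k, v in sorted(markers.items()))

def make_training_pair(example, drop_prob=0.5, rng=random):
    """Build an (input, target) pair. With probability drop_prob the markers are
    left out of the input and prepended to the target instead, so the model
    learns to infer them; otherwise they condition the input explicitly."""
    markers = render_markers(example["markers"])
    if rng.random() < drop_prob:
        return example["prompt"], markers + example["response"]
    return markers + example["prompt"], example["response"]

def split_markers(generation):
    """At inference time, separate any leading predicted markers from the answer."""
    head = LEADING_MARKERS_RE.match(generation).group(0)
    return dict(MARKER_RE.findall(head)), generation[len(head):]

example = {  # invented example
    "prompt": "Fix: def f(x): return x +",
    "response": "def f(x): return x + 1",
    "markers": {"task": "CodeRepair", "domain": "code"},
}
explicit = make_training_pair(example, drop_prob=0.0)   # markers in the input
inferred = make_training_pair(example, drop_prob=1.0)   # markers in the target
```

Mixing both kinds of pairs during fine-tuning is what would make the markers optional. A user can pass them to steer generation, or omit them and let the model predict its own.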
Advanced Deep Learning Approaches for Automated Recognition of Cuneiform Symbols
Elshehaby, Shahad, Panthakkan, Alavikunhu, Al-Ahmad, Hussain, Al-Saad, Mina
All authors: College of Engineering and IT, University of Dubai, Dubai, United Arab Emirates (s0000002884@ud.ac.ae, apanthakkan@ud.ac.ae, halahmad@ud.ac.ae, malsaad@ud.ac.ae).
This paper presents a fully automated method for identifying and interpreting cuneiform characters using advanced deep-learning algorithms. Five distinct deep-learning models were trained on a comprehensive dataset of cuneiform characters and evaluated on key performance metrics, including accuracy and precision. The two best-performing models were then assessed on cuneiform symbols from the Laws of Hammurabi, notably Law 1. Both models recognized the relevant Akkadian meanings of the symbols and produced accurate English translations. Future work will investigate ensemble and stacking approaches, using hybrid architectures to further improve detection accuracy and reliability.
- Asia > Middle East > UAE > Dubai Emirate > Dubai (1.00)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.05)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (6 more...)
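The cuneiform study compares five models on accuracy and precision and carries the two strongest forward. The sketch below shows that comparison step under two assumptions. "Precision" is taken as macro-averaged precision over all labels seen in the true or predicted sets, and undefined precision is counted as 0. As far as we know, scikit-learn's default macro precision uses a similar convention. Ranking models by accuracy first and then precision is also our assumption. The sign labels and model names are invented placeholders.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_precision(y_true, y_pred):
    """Unweighted mean over labels of TP / (TP + FP); a label that is never
    predicted has undefined precision and is counted as 0 here."""
    labels = set(y_true) | set(y_pred)
    predicted = Counter(y_pred)
    tp = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    return sum(tp[c] / predicted[c] if predicted[c] else 0.0 for c in labels) / len(labels)

def select_top_models(y_true, predictions, k=2):
    """Rank models by (accuracy, macro precision), highest first, and keep k."""
    scores = {name: (accuracy(y_true, p), macro_precision(y_true, p))
              for name, p in predictions.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Invented sign labels and model outputs, for illustration only.
y_true = ["AN", "KI", "LUGAL", "AN"]
predictions = {
    "model_A": ["AN", "KI", "LUGAL", "AN"],
    "model_B": ["AN", "AN", "LUGAL", "KI"],
    "model_C": ["KI", "KI", "LUGAL", "AN"],
}
top_two = select_top_models(y_true, predictions)
```

Reporting macro precision alongside accuracy matters for sign inventories with skewed frequencies. A model that overpredicts a common sign can score well on accuracy while its precision on that sign drops.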
A Predictive Services Architecture for Efficient Airspace Operations
de Oliveira, Ítalo Romani, Ayhan, Samet, Balvedi, Glaucia, Biglin, Michael, Costas, Pablo, Neto, Euclides C. Pinto, Leite, Alexandre, de Azevedo, Felipe C. F.
Predicting air traffic congestion and managing traffic flow are essential for airlines and Air Navigation Service Providers (ANSPs) to enhance operational efficiency. Accurate estimates of future airport capacity and airspace density are vital for better airspace management, reducing air traffic controller workload and fuel consumption and ultimately promoting sustainable aviation. While existing literature has addressed these challenges, data management and query processing remain complex due to the vast volume of high-rate air traffic data. Many analytics use cases require a common pre-processing infrastructure, as ad-hoc approaches are insufficient. Additionally, linear prediction models often fall short, necessitating more advanced techniques. This paper presents a data processing and predictive services architecture that ingests large, uncorrelated, and noisy streaming data to forecast future airspace system states. The system continuously collects raw data, periodically compresses it, and stores it in NoSQL databases for efficient query processing. For prediction, the system learns from historical traffic by extracting key features such as airport arrival and departure events, sector boundary crossings, weather parameters, and other air traffic data. These features are fed into various regression models, including linear, non-linear, and ensemble models, and the best-performing model is selected for prediction. We evaluate this infrastructure on three prediction use cases in the US National Airspace System (NAS) and a segment of European airspace using extensive real operational data, and the results confirm that the system predicts future system states efficiently and accurately.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Florida > Orange County > Orlando (0.05)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- (13 more...)
- Transportation > Air (1.00)
- Transportation > Infrastructure & Services > Airport (0.53)
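The predictive-services architecture fits several regression models (linear, non-linear, and ensemble) to the extracted features and keeps the best performer. The abstract does not name the models or the selection criterion, so the sketch below uses hypothetical stand-ins. These are ordinary least squares on one feature, a k-nearest-neighbour regressor, and an averaging ensemble of the two, with mean absolute error on a held-out split as the assumed criterion.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for a single feature: y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_knn(xs, ys, k=3):
    """Non-linear stand-in: average the targets of the k nearest training points."""
    points = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(points, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict

def fit_mean_ensemble(xs, ys):
    """Ensemble stand-in: average the predictions of the two models above."""
    members = [fit_linear(xs, ys), fit_knn(xs, ys)]
    return lambda x: sum(m(x) for m in members) / len(members)

def mae(model, xs, ys):
    """Mean absolute error of a fitted model on (xs, ys)."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

def select_best(train, valid, candidates):
    """Fit every candidate on the training split and keep the lowest validation MAE."""
    fitted = {name: fit(*train) for name, fit in candidates.items()}
    errors = {name: mae(model, *valid) for name, model in fitted.items()}
    best = min(errors, key=errors.get)
    return best, fitted[best], errors

CANDIDATES = {"linear": fit_linear, "knn": fit_knn, "ensemble": fit_mean_ensemble}
# Invented toy data with an exactly linear relationship.
train = ([0, 1, 2, 3, 4], [1, 4, 7, 10, 13])
valid = ([1.5, 2.5], [5.5, 8.5])
best, model, errors = select_best(train, valid, CANDIDATES)
```

Selecting on a held-out split rather than training error keeps the flexible models from winning simply by memorizing history. On non-linear traffic patterns the same loop would favour the k-NN or ensemble candidates.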
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
Yue, Zhengrong, Zhuang, Shaobin, Li, Kunchang, Ding, Yanbo, Wang, Yali
Despite recent advances in video stylization, most existing methods struggle to render arbitrary videos with complex transitions according to an open-ended style description in a user query. To fill this gap, we introduce V-Stylist, a generic multi-agent system for video stylization built on a novel collaboration and reflection paradigm of multimodal large language models. Specifically, V-Stylist is a systematic workflow with three key roles: (1) Video Parser decomposes the input video into a number of shots and generates text prompts describing the key content of each shot. Via a concise video-to-shot prompting paradigm, it allows V-Stylist to effectively handle videos with complex transitions. (2) Style Parser identifies the style in the user query and progressively searches for the matching style model in a style tree. Via a robust tree-of-thought searching paradigm, it allows V-Stylist to precisely specify vague style preferences in the open user query. (3) Style Artist leverages the matched model to render all video shots in the required style. Via a novel multi-round self-reflection paradigm, it allows V-Stylist to adaptively adjust detail control according to the style requirement. With this design, which mimics human professionals, V-Stylist addresses the primary challenges of effective and automatic video stylization. Moreover, we construct a new benchmark, the Text-driven Video Stylization Benchmark (TVSBench), which fills the gap in assessing stylization of complex videos on open user queries. Extensive experiments show that V-Stylist achieves state-of-the-art performance; e.g., it surpasses FRESCO and ControlVideo by 6.05% and 4.51%, respectively, in overall average metrics, marking a significant advance in video stylization.
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Europe > Germany > Lower Saxony > Hanover (0.04)
- Asia > China > Shanxi Province > Taiyuan (0.04)
- Research Report > New Finding (0.48)
- Research Report > Promising Solution (0.34)
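V-Stylist's Style Parser progressively searches a style tree for the model that matches the user query. The sketch below illustrates such a search as a beam search, a simple tree-of-thought-style strategy that keeps several candidate branches alive at each level. Several pieces are hypothetical. The tree contents and model names are invented, and `relevance` is a word-overlap stand-in for the MLLM's judgment of how well a node matches the query. V-Stylist's actual tree and prompts are not described in the abstract.

```python
# Hypothetical style tree: internal nodes are dicts, leaves are style-model names.
STYLE_TREE = {
    "painting": {
        "oil painting": {"van gogh": "model_vangogh", "impressionism": "model_monet"},
        "ink wash": {"chinese ink": "model_ink"},
    },
    "animation": {
        "anime": {"ghibli": "model_ghibli", "cyberpunk anime": "model_cyber"},
        "3d cartoon": {"pixar": "model_pixar"},
    },
}

def relevance(query, label):
    """Stand-in for the MLLM's judgment: fraction of the label's words in the query."""
    query_words = set(query.lower().split())
    label_words = label.split()
    return sum(w in query_words for w in label_words) / len(label_words)

def search_style(query, tree, beam=2):
    """Beam search down the tree: keep the `beam` best partial paths per level,
    scored by cumulative relevance; return the best leaf's model and its path."""
    beams = [(0.0, [], tree)]
    finished = []
    while beams:
        candidates = []
        for score, path, node in beams:
            for label, child in node.items():
                item = (score + relevance(query, label), path + [label], child)
                # Internal nodes compete for the next beam; leaves are final answers.
                (candidates if isinstance(child, dict) else finished).append(item)
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam]
    best_score, best_path, model = max(finished, key=lambda f: f[0])
    return model, best_path

choice = search_style("make it look like a ghibli anime film", STYLE_TREE)
```

With `beam=1` this reduces to a greedy descent. Wider beams let a branch that scores poorly at a coarse level still win if its leaves match the query closely, which helps when the query's style words are vague.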